K-means may perform as well as mixture model clustering but may also be much worse: comment on Steinley and Brusco (2011).

Author

  • Jeroen K Vermunt
Abstract

Steinley and Brusco (2011) presented the results of a huge simulation study aimed at evaluating cluster recovery of mixture model clustering (MMC) both for the situation where the number of clusters is known and is unknown. They derived rather strong conclusions on the basis of this study, especially with regard to the good performance of K-means (KM) compared with MMC. I agree with the authors' conclusion that the performance of KM may be equal to MMC in certain situations, which are primarily the situations investigated by Steinley and Brusco. However, a weakness of the paper is the failure to investigate many important real-world situations where theory suggests that MMC should outperform KM. This article elaborates on the KM-MMC comparison in terms of cluster recovery and provides some additional simulation results that show that KM may be much worse than MMC. Moreover, I show that KM is equivalent to a restricted mixture model estimated by maximizing the classification likelihood and comment on Steinley and Brusco's recommendation regarding the use of mixture models for clustering.
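
The equivalence mentioned at the end of the abstract can be sketched as follows. This is a minimal derivation under assumptions the abstract does not spell out: the restricted model is taken to be a Gaussian mixture with equal mixing proportions 1/K and a common spherical covariance \sigma^2 I. The symbols x_i (observation), z_{ik} (hard 0/1 cluster assignment with \sum_k z_{ik} = 1), \mu_k (cluster mean), n (sample size), and p (number of variables) are introduced here for illustration only.

```latex
% Classification log-likelihood of the restricted mixture (assumed restrictions:
% equal proportions 1/K, common spherical covariance \sigma^2 I).
\begin{align}
\log L_C(z, \mu, \sigma^2)
  &= \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik}
     \left[ \log \frac{1}{K} + \log \mathcal{N}\!\left(x_i \mid \mu_k, \sigma^2 I\right) \right] \\
  &= -n \log K \;-\; \frac{np}{2} \log\!\left(2\pi\sigma^2\right)
     \;-\; \frac{1}{2\sigma^2} \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik}\, \lVert x_i - \mu_k \rVert^2 .
\end{align}
```

For any fixed \sigma^2 > 0, only the last term depends on the assignments and the means, so maximizing \log L_C over z and \mu amounts to minimizing the within-cluster sum of squares \sum_i \sum_k z_{ik} \lVert x_i - \mu_k \rVert^2, which is the K-means criterion.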

Similar resources

Commentary on Steinley and Brusco (2011): recommendations and cautions.

I discuss the recommendations and cautions in Steinley and Brusco's (2011) article on the use of finite models to cluster a data set. In their article, much use is made of comparison with the K-means procedure. As noted by researchers for over 30 years, the K-means procedure can be viewed as a special case of finite mixture modeling in which the components are in equal (fixed) proportions and a...

Choosing the number of clusters in K-means clustering.

Steinley (2007) provided a lower bound for the sum-of-squares error criterion function used in K-means clustering. In this article, on the basis of the lower bound, the authors propose a method to distinguish between 1 cluster (i.e., a single distribution) versus more than 1 cluster. Additionally, conditional on indicating there are multiple clusters, the procedure is extended to determine the ...

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. The question now arises whether those results extend to other languages. Since the goal of document clustering is grouping of docum...

Decentralisation – A Portmanteau Concept That Promises Much but Fails to Deliver?; Comment on “Decentralisation of Health Services in Fiji: A Decision Space Analysis”

Decentralisation has been described as an empty concept that lacks clarity. Yet there is an enduring interest in the process of decentralisation within health systems and public services more generally. Many claims about the benefits of decentralisation are not supported by evidence. It may be useful as an organising framework for analysis of health systems but in this context it lacks conceptu...

Data Clustering Using A New CGA (Chaotic-Generic Algorithm) Approach

Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has enjoyed many applications in clustering data. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...

Journal title:
  • Psychological Methods

Volume 16, Issue 1

Pages: -

Publication date: 2011